Using Index Structures for Anytime Stream Mining

Authors

  • Philipp Kranen
  • Thomas Seidl

Abstract

Stream data mining has gained a lot of attention in recent years due to the abundance of streaming data in professional as well as personal applications. Solutions have been proposed for many mining tasks such as clustering, classification, frequent itemset mining and aggregation. Stream mining is especially challenging because of the large (usually endless) amount of data and the time constraints imposed by the stream's arrival rate. We recently presented an index-based solution for anytime stream classification that handles both large amounts of data and arbitrary arrival times. In this paper we present our ongoing work, in which we investigate bulk loading strategies to improve classification accuracy under anytime constraints. We show promising results and discuss future challenges related to index-based classification on data streams. Furthermore, we discuss extensions of our technique to other data mining tasks.
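The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea behind index-based anytime classification. It is not the authors' method. It assumes that each class is summarized by a hypothetical tree whose nodes store a point count and a diagonal Gaussian. The classifier starts from the roots and keeps replacing the most promising node by its children for as long as its time budget allows. Here that budget is modeled as a hypothetical count of node refinements, `budget`. The names `Node`, `bulk_load`, `density` and `anytime_classify`, the median-split bulk loading and the variance floor are all illustrative assumptions.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


@dataclass
class Node:
    """Subtree summary: point count plus a diagonal Gaussian (per-dimension mean and variance)."""
    n: int
    mean: List[float]
    var: List[float]
    children: List["Node"] = field(default_factory=list)


def summarize(points: List[Point], var_floor: float) -> Node:
    """Fit a childless node to `points`; variances are floored to avoid degenerate Gaussians."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [max(var_floor, sum((p[j] - mean[j]) ** 2 for p in points) / n) for j in range(d)]
    return Node(n, mean, var)


def bulk_load(points: List[Point], leaf_size: int = 2, var_floor: float = 0.05) -> Node:
    """Hypothetical top-down bulk load (leaf_size >= 1): split at the median of the
    highest-variance dimension until a node holds at most `leaf_size` points."""
    node = summarize(points, var_floor)
    if len(points) > leaf_size:
        dim = max(range(len(node.var)), key=lambda j: node.var[j])
        ordered = sorted(points, key=lambda p: p[dim])
        mid = len(ordered) // 2
        node.children = [bulk_load(ordered[:mid], leaf_size, var_floor),
                         bulk_load(ordered[mid:], leaf_size, var_floor)]
    return node


def density(x: Point, node: Node) -> float:
    """Diagonal Gaussian density of `x` under the node's summary (accumulated in log space)."""
    log_p = sum(-0.5 * (math.log(2 * math.pi * v) + (xj - m) ** 2 / v)
                for xj, m, v in zip(x, node.mean, node.var))
    return math.exp(log_p)


def anytime_classify(x: Point, trees: Dict[str, Node], budget: int) -> str:
    """Refine at most `budget` inner nodes, then return the best-scoring class.

    A class's score is the sum of n_node * density(x, node) over its current frontier,
    which is proportional to P(c) * p(x | c) when the prior is P(c) = N_c / N.
    """
    scores: Dict[str, float] = {}
    heap: List[Tuple[float, int, str, Node]] = []  # max-heap via negated contribution
    tie = 0  # unique tie-breaker so Node objects are never compared
    for label, root in trees.items():
        scores[label] = root.n * density(x, root)
        if root.children:
            heapq.heappush(heap, (-scores[label], tie, label, root))
            tie += 1
    for _ in range(budget):
        if not heap:
            break  # every frontier consists of leaves; nothing left to refine
        neg_contrib, _tie, label, node = heapq.heappop(heap)
        scores[label] += neg_contrib  # drop the coarse node's contribution
        for child in node.children:
            contrib = child.n * density(x, child)
            scores[label] += contrib
            if child.children:
                heapq.heappush(heap, (-contrib, tie, label, child))
                tie += 1
    return max(scores, key=scores.get)


# Toy usage with two well-separated classes.
a_pts = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1), (0.3, 0.2)]
b_pts = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
trees = {"a": bulk_load(a_pts), "b": bulk_load(b_pts)}
for budget in (0, 1, 4):
    print(budget, anytime_classify((0.2, 0.1), trees, budget))  # prints "a" for each budget
```

The loop can be interrupted after any refinement and still return a prediction. This is the defining property of an anytime classifier: more time yields a finer frontier and, typically, a better density estimate. A practical version would keep scores in log space to avoid underflow, insert new stream objects incrementally, and compare bulk loading strategies rather than fixing the single median split assumed here.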

Similar Resources

High-Speed Data Stream Mining using VFDT

Many large databases grow without limit at a rate of several million records per day, and mining these continuous data streams brings unique opportunities to researchers. Here we describe and evaluate VFDT, an anytime system that builds decision trees using constant memory and constant time per example. VFDT can incorporate tens of thousands of examples per second. It uses the Hoeffding bound...

Application of Data-Mining Algorithms in the Sensitivity Analysis and Zoning of Areas Prone to Gully Erosion in the Indicator Watersheds of Khorasan Razavi Province

Extended abstract 1- Introduction Gully erosion is one of the most important sources of sediment in the watersheds and a common phenomenon in semi-arid climate that affects vast areas with different morphological, soil and climatic conditions. This type of erosion is very dangerous due to the transfer of fertile soil horizons, and the reduction of water holding capacity also is a factor for s...

Incrementally Optimized Decision Tree for Mining Imperfect Data Streams

The Very Fast Decision Tree (VFDT) is one of the most important classification algorithms for real-time data stream mining. However, imperfections in data streams, such as noise and imbalanced class distribution, do exist in real world applications and they jeopardize the performance of VFDT. Traditional sampling techniques and post-pruning may be impractical for a non-stopping data stream. To ...

Identification of Ti- anomaly in stream sediment geochemistry using of stepwise factor analysis and multifractal model in Delijan district, Iran

In this study, 115 samples taken from the stream sediments were analyzed for concentrations of As, Co, Cr, Cu, Ni, Pb, W, Zn, Au, Ba, Fe, Mn, Sr, Ti, U, V and Zr. In order to outline mineralization-derived stream sediments, various mapping techniques including fuzzy factor score, geochemical halos and fractal model were used. Based on these models, concentrations of Co, Cr, Ni, Zn, Ba, Fe, Mn, ...

Publication date: 2009